Submission to metric track

Introduction

Deciding between man and zone coverage is one of the most critical strategic decisions a defensive coordinator must make before each offensive play in American football. While experienced offensive coordinators and quarterbacks often rely on visual cues to identify these defensive schemes, the increasing availability of player tracking data offers a new avenue to uncover these tactics. A notable example is Amazon’s NFL Next Gen Stats model, which delivers coverage predictions during live broadcasts (see snapshot below). However, pre-snap motion does not seem to play a prominent role in this model (see Amazon), although it is a crucial element of modern offensive strategies.

Hence, our contribution explores the potential of incorporating pre-snap motion. While we similarly predict man or zone coverage before motion, we further leverage the additional information of pre-snap player movements. Specifically, in addition to including rather naive post-motion features, we use a hidden Markov model (HMM) to model defenders’ trajectories based on hidden states, which represent the offensive players they may be guarding. Incorporating summary statistics of the state decoding results as features into the existing models substantially improves the predictive ability. This lays the groundwork for further analyses such as the evaluation of the effectiveness of pre-snap motion in uncovering defensive strategies.

Coverage Prediction

Data

We aim to forecast man or zone coverage using the pff_passCoverage indicator in the play-by-play data. We omit plays tagged as others, plays with more than five offensive linemen, and plays with two quarterbacks. Since we are specifically interested in analyzing pre-snap player movements, we concentrate on plays that contain any pre-snap motion. Ultimately, we end up with \(3985\) offensive plays in total, of which the defense played \(2973\) in zone and \(1012\) in man coverage.

Feature engineering

To accurately forecast the defensive scheme, we create various features derived from the tracking data. In particular, we conduct the following feature engineering steps: First, using all 11 players on each side, we compute the area spanned by the convex hull of each team, the largest \(y\)-distance (i.e., the width of the hull) and the largest \(x\)-distance (i.e., the length of the hull), resulting in 6 features. Then, we select the five most relevant players on each side of the ball. For the offense, we omit the offensive line and the QB. For the defense, we disregard defensive linemen (NT, DT, DE) and select the five defenders closest to the five offensive players according to a weighted Euclidean distance that puts much more emphasis on the \(y\)-axis. From these 10 players, we derive 20 features related to their (standardized) positions.
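The three hull features per team can be illustrated with a minimal Python sketch. This is not our R implementation; the function names `convex_hull` and `hull_features` and the example coordinates are hypothetical, and the sketch only assumes that each player is given as an \((x, y)\) pair.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of the cross product (a - o) x (b - o)
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_features(points):
    """Area of the convex hull plus its extent along x (length) and y (width)."""
    hull = convex_hull(points)
    # Shoelace formula for the area of a simple polygon
    twice_area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return {
        "area": abs(twice_area) / 2.0,
        "length": max(xs) - min(xs),  # largest x-distance
        "width": max(ys) - min(ys),   # largest y-distance
    }


# Illustrative coordinates only: four corner players and one interior player
example = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 1.0)]
features = hull_features(example)
```

Applying `hull_features` once to the 11 offensive players and once to the 11 defenders would yield the 6 hull features described above.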

Additionally, we extract information from the play-by-play data, such as quarter, down, yards to go, home and away score and the remaining seconds in the current half. See the Appendix for a more detailed description and a discussion on the choice of features.

Analysis

We train different models to predict whether the defense plays a man- or zone coverage scheme. Since the aim of the project is to show the effectiveness of pre-snap motion, we follow a three-step approach:

  1. Pre-motion models
  2. Naive post-motion models
  3. HMM post-motion models

In general, we have a limited dataset available (only 3985 plays) and therefore need to manage model complexity by controlling the number of features. Given the small dataset, we focus on the 32 previously described basic features.

Regarding a suitable basic model class for predicting man or zone coverage, we opt for the following two: First, we fit a glmnet (elastic net) model, which performs implicit feature selection and can handle multicollinearity. Second, we use an xgboost model, which additionally captures non-linear effects (and interactions). For these models, we use 10-fold cross validation on a suitable hyperparameter grid.
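For reference, the elastic net idea can be written out explicitly. The following is the standard penalized logistic regression objective as commonly stated for glmnet, given here as background rather than as a formula taken from our code. The symbols are introduced only for this illustration: \(z_n \in \{0,1\}\) is the coverage label of play \(n\), \(x_n\) its feature vector, and \(\lambda \ge 0\) and \(\alpha \in [0,1]\) are the two tuning parameters.

\[\min_{\beta_0, \beta} \; -\frac{1}{N} \sum_{n=1}^{N} \Bigl[ z_n (\beta_0 + x_n^\top \beta) - \log\bigl(1 + e^{\beta_0 + x_n^\top \beta}\bigr) \Bigr] + \lambda \Bigl[ \tfrac{1-\alpha}{2} \lVert \beta \rVert_2^2 + \alpha \lVert \beta \rVert_1 \Bigr]\]

The \(\ell_1\) term can shrink coefficients exactly to zero, which corresponds to the implicit feature selection, while the \(\ell_2\) term stabilizes the estimates of correlated features.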

1. Pre-motion models

First, we fit the aforementioned models with the previously described basic pre-motion features. These very basic models serve as baselines that allow us to measure the effect of pre-snap motion features in the following steps.

2. Naive post-motion models

Second, we extend our basic pre-motion model with naive post-motion features. To keep the complexity manageable, we derive only six additional post-motion features: for each team (offense and defense), we compute the maximum \(y\)-distance, the maximum \(x\)-distance and the total distance traveled until the snap.

3. HMM post-motion models

3.1 Hidden Markov model

The previously described models primarily serve as baselines for evaluating the impact of incorporating pre-snap motion information into predicting defensive schemes. The main contribution of our project lies in the effective integration of this information using an HMM.

Specifically, we use results derived from an HMM fitted to the pre-snap motion tracking data as additional features in the previously described models. To achieve this, we model the movements of the five defensive players during pre-snap motion by an HMM (see the Appendix for an in-depth description). In particular, we assume that each defender’s \(y\)-coordinate at each time point \(t\) is a realization from a Gaussian distribution with mean aligned to the \(y\)-coordinate of the offensive player they are guarding and an estimated standard deviation. HMMs are particularly well-suited for this task since the guarding assignments are not directly observed but need to be inferred from the defenders’ responses to the offensive players’ movements. Hence, the model naturally treats this information as a latent state variable, making each observation a realization from a mixture of Gaussian distributions (see Franks et al. 2015 for a similar approach in basketball). The ultimate goal is to use the fitted model for state decoding, i.e., inferring information on the guarding assignment based on the observations made. HMMs excel in this context as they leverage not only the defender’s \(y\)-coordinate and the \(y\)-coordinates of all offensive players potentially guarded at the current time point, but also incorporate probabilistic information based on the previous and subsequent observations. State decoding can either provide us with fully probabilistic information, i.e., the probability of each defender guarding each offensive player at each time point, or a single most likely guarding assignment for each defender at each time point. We will use the former to enhance our post-motion models and the latter for visualization purposes. For details on model fitting and state decoding, we also refer to the Appendix.

The results of the HMM are exemplified using the following video and animation. They display a touchdown of the Kansas City Chiefs against the Arizona Cardinals in Week 1 of the 2022 NFL season. We can see that, pre-snap, Mecole Hardman (KC #17) is in motion. He is immediately followed by the defender Marco Wilson (AZ #20), which is a clear indication of man coverage.

After state decoding using the fitted HMM, we can visualize the inferred guarding assignment as depicted below. It becomes evident that the HMM effectively captures Marco Wilson’s guarding assignment. The unique strength of HMMs is apparent when Marco Wilson reaches the same \(y\)-coordinate as the running back, Jerick McKinnon (KC #1). At this point, clustering algorithms that disregard the temporal component would briefly suggest a change in the guarding assignment, introducing noise into summary statistics such as the total number of switches predicted by the model. In contrast, the HMM consistently and accurately maintains the correct coverage assignment throughout the motion as a consequence of its temporal persistence.

Coverage Prediction according to the HMM.

3.2 Enhanced post-motion model

Third, we re-train the models from above, now integrating HMM results as additional features. A caveat of the aforementioned decoded state probabilities is that they form a multi-dimensional time series, complicating their direct inclusion as features in the previously described models. This challenge can, however, be remedied by employing suitable summary statistics. To achieve this, we first determine for each defender the most likely offensive player \(k = 1, \ldots, 5\) to be guarded at each time point \(t = 1, \ldots, T_n\), where \(T_n\) is the length of the \(n\)-th play, and count the number of state switches, i.e. the number of times a defender’s most likely guarded offensive player differs between two consecutive time points \(t\) and \(t+1\). From this, we calculate some simple summary statistics, specifically 1) the sum of state switches, 2) the average number of state switches, and 3) the number of defenders that switch offensive players during a play. Additionally, we use the aforementioned decoded state probabilities to calculate a more elaborate statistic, the mean entropy across defenders \(j = 1, \ldots, 5\) in each offensive play \(n = 1,\ldots, N\):

\[H(n) = - \frac{1}{5}\sum_{j = 1}^5 \sum_{k=1}^{5} \left( \frac{1}{T_n} \sum_{t=1}^{T_n} \mathbb{1}\left(\arg\max_{i=1,\ldots,5} X_{t,i} = k\right) \cdot \log\left(\frac{1}{T_n} \sum_{t=1}^{T_n} \mathbb{1}\left(\arg\max_{i=1,\ldots,5} X_{t,i} = k\right)\right) \right)\]

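Here, \(X_{t,i}\) denotes the decoded probability that the defender under consideration guards offensive player \(i\) at time \(t\); the indices \(j\) and \(n\) are suppressed for readability. As entropy is a measure of uncertainty in a probability distribution, lower (higher) entropy indicates greater (un)predictability. We therefore suspect that higher (lower) entropy values are associated with less (more) persistent guarding allocations.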
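The summary statistics can be sketched in a few lines of Python. This is an illustrative sketch rather than our R code: the function name `play_summary` and the nested-list layout `probs[j][t][k]` (defender, time point, offensive player) are assumptions made for the example. The entropy is computed from the empirical frequencies of the most likely guarded player, as in the formula for \(H(n)\), with the convention \(0 \log 0 = 0\).

```python
import math


def play_summary(probs):
    """Summarize decoded state probabilities of one play.

    probs[j][t][k] is the decoded probability that defender j guards
    offensive player k at time point t (hypothetical data layout).
    """
    switches, entropies = [], []
    for defender in probs:
        # Most likely guarded offensive player at each time point
        path = [max(range(len(p)), key=p.__getitem__) for p in defender]
        # A switch occurs whenever consecutive time points disagree
        switches.append(sum(a != b for a, b in zip(path, path[1:])))
        # Empirical share of time points assigned to each visited player
        freqs = [path.count(k) / len(path) for k in set(path)]
        entropies.append(-sum(f * math.log(f) for f in freqs))
    n_def = len(probs)
    return {
        "total_switches": sum(switches),
        "mean_switches": sum(switches) / n_def,
        "n_switching_defenders": sum(s > 0 for s in switches),
        "mean_entropy": sum(entropies) / n_def,
    }


# Toy play with two defenders and two offensive players (illustrative values)
stable = [[0.9, 0.1]] * 4                                   # never switches
mixed = [[0.8, 0.2], [0.7, 0.3], [0.4, 0.6], [0.2, 0.8]]   # switches once
summary = play_summary([stable, mixed])
```

In this toy play, the first defender contributes zero entropy and the second contributes \(\log 2\), so the mean entropy is \(\log(2)/2\); a pure man-coverage look with no switches would give a value of zero.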

Results

Model and feature evaluation

To assess whether the incorporation of pre-snap player movements can improve the prediction of defensive schemes, we evaluate the predictive performance of our three models using a train-test split of 85/15. To ensure robust and reliable results, we repeat the evaluation process 50 times, using a different split in each iteration. For each split, we train and fine-tune the model on the training data and then evaluate its performance on the test set. This approach helps minimize the risk of our findings being influenced by randomness, which is particularly important given the limited size of our dataset. The results of the evaluation are shown in the figures below.

The first figure illustrates the classification accuracy, computed using a threshold of 50%. Since this value is somewhat arbitrary and our dataset is slightly imbalanced (\(\approx\) 75% zone and \(\approx\) 25% man coverage plays), accuracy is a suboptimal metric (see Appendix for a discussion on evaluation metrics). Hence, we also present the AUC values of the three models in the second figure. Several observations emerge from these metrics. First, the more flexible xgboost model outperforms glmnet, but also exhibits greater variation, indicating that additional data could enhance the tuning of this model. Second, the pre-motion model clearly performs worst. Once motion information is incorporated, we observe a substantial improvement in performance. While this is to be expected, importantly, we also see that the HMM post-motion model outperforms the model only using naive post-motion information. This indicates that the features derived from the HMM’s state decoding capture distinct information beyond simple motion features.

Team analyses

Our approach is not limited to the mere prediction of coverages; it can also be applied to a variety of analyses. As an illustrative example, we examine how pre-snap motion has helped teams, according to our model, identify the correct defensive coverage. Specifically, we calculate pre- and HMM post-motion probabilities for each team and evaluate the extent to which these models accurately predict the actual coverages.

As shown in the following table, teams that are well known for extensive use of pre-snap motion, such as the Niners and Dolphins, are also among the ones that most frequently increased the likelihood of correctly decoding the defensive coverage. While some teams, such as the Giants or Titans, succeed in accurately identifying coverages despite employing limited motion, the general trend indicates that teams utilizing extensive motion are more effective at leveraging it to recognize defensive schemes.

Discussion

This project explored the use of HMMs to enhance the prediction of defensive schemes by incorporating pre-snap player movements. Models augmented with HMM-derived features demonstrated improved predictive performance compared to those relying solely on naive motion data.

While our pre- and post-motion models are limited by the small sample size available, our HMM approach is modular. The prediction models can therefore be seamlessly replaced by other, potentially more complex models if a richer data set becomes available. Combining such models with the HMM-derived features could make fuller use of the information contained in the tracking data.

Code

All code for data pre-processing, model training, prediction and player evaluation can be found here.

References

*Franks A, Miller A, Bornn L, Goldsberry K (2015). Characterizing the Spatial Structure of Defensive Skill in Professional Basketball. The Annals of Applied Statistics, 9(1). DOI:10.1214/14-AOAS799.

*Koslik J (2024). LaMa: Fast Numerical Maximum Likelihood Estimation for Latent Markov Models. R package version 2.0.2, https://CRAN.R-project.org/package=LaMa.

*Zucchini W, MacDonald I, Langrock R (2016). Hidden Markov Models for Time Series - An Introduction Using R. CRC Press.

Appendix

Feature engineering

Prior to more involved feature engineering steps, we transform the coordinate system by redefining the \(x\)-variable as the \(x\)-distance to the endzone (such that all play directions are from right to left and the relevant endzone is at zero), and by changing the direction variable such that zero degrees represents heading straight towards the corresponding endzone. As mentioned in the main text, we use player features from 5 defensive and 5 offensive players. First, we standardize their \(x\)- and \(y\)-coordinates with respect to the football and order the players according to their \(y\)-coordinates, i.e., the first defender in our dataset is always the leftmost defensive player, while the first offensive player is always the rightmost one (offensive play direction is from left to right). Furthermore, for each player we compute distances to the football and their orientation with respect to the quarterback.

Model features and comparison

As mentioned before, we only have a small amount of relevant data available. To avoid overfitting, we therefore focus on a basic feature set of 32 variables for our main results: 6 convex hull features (3 each for offense and defense), 20 player position features (10 standardized \(x\)- and 10 standardized \(y\)-coordinates, 5 each for offense and defense), and 6 play-by-play features. However, using more thoroughly crafted features such as the distances and orientations described above, we can enlarge the feature set to 67 variables in total by adding 30 distance variables (for each of the 10 relevant players the total distance, \(x\)-distance, and \(y\)-distance to the football) and 5 orientation variables (for each defender the orientation with respect to the QB).

In the following, we provide a comparison of these enlarged models and the smaller ones used for our main results. When comparing predictions of a binary outcome variable, accuracy can be a misleading metric, since it depends on the threshold selected for classification. Usually, for probabilistic predictions as obtained by the two model classes considered, one uses the naive threshold of \(0.5\). However, doing so is arbitrary, and especially for imbalanced data, changing the threshold may change the accuracy drastically. AUC (the area under the ROC curve), on the other hand, evaluates the performance of a model across all classification thresholds and thus avoids the subjective choice of a threshold. Another popular metric is the logloss (or negative log-likelihood loss), which is often preferred because it is a proper scoring rule. While using the logloss as evaluation metric is mathematically the best option, its interpretation is not as intuitive. Specifically, whether the logloss of a model can be considered good depends on the (im)balance of the classes (as well as the number of classes, which in our binary case is only 2). Without going into further detail, we therefore focus on these two metrics for the evaluation of the models.
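The threshold dependence of accuracy can be made concrete with a small Python sketch. The functions below are textbook definitions written for illustration (they are not taken from our R code), and the labels and predicted probabilities are made-up values mimicking a 25% man / 75% zone split.

```python
import math


def accuracy(y, p, threshold=0.5):
    """Share of correct hard classifications at the given threshold."""
    return sum((pi >= threshold) == bool(yi) for yi, pi in zip(y, p)) / len(y)


def auc(y, p):
    """Probability that a random positive is ranked above a random negative.

    Ties count one half. Quadratic in the sample size, which is fine for a sketch.
    """
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def logloss(y, p, eps=1e-15):
    """Mean negative Bernoulli log-likelihood, clipped away from 0 and 1."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total / len(y)


# Made-up example: 1 = man, 0 = zone
y = [1, 0, 0, 0]
p_ranked = [0.4, 0.3, 0.2, 0.1]   # ranks the man play highest
p_const = [0.25] * 4              # uninformative base-rate prediction

acc_default = accuracy(y, p_ranked)          # 0.75: every play classified as zone
acc_shifted = accuracy(y, p_ranked, 0.35)    # 1.0: the man play is now detected
auc_ranked = auc(y, p_ranked)                # 1.0: perfect ranking
auc_const = auc(y, p_const)                  # 0.5: no ranking information
```

At the default threshold, the informative and the uninformative predictions both reach 75% accuracy, whereas the AUC separates them clearly (1.0 versus 0.5). The logloss also distinguishes them, but its scale depends on the class balance, which makes raw values harder to interpret.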

We see that using the larger feature set (models containing “AF” in the name) seems to provide more information. At least for the glmnet model, which is simpler but known to handle a high number of features well, the versions including the larger feature set (orientations and distances) perform better than the corresponding smaller models. For the xgboost model, the version including all features performs on par with the one including only the small set of basic features. This suggests that the model may still improve with more data.

Finally, we mention that we also varied the number of players used in the analyses. Presently, we use only information of 5 offensive and 5 defensive players. However, using more information resulted in worse performance of the models, thus we refrain from showing the results here.

Hidden Markov model

A hidden Markov model consists of an observed time series \(\{y_t\}_{t=1}^T\) and an unobserved first-order Markov chain \(\{ g_t\}_{t=1}^T\), with \(g_t \in \{1,\ldots,N\}\). In this case, at every time point \(t\), \(y_t\) is the \(y\)-coordinate of the defensive player and \(g_t\) represents the offensive player being guarded, i.e. the guarding assignment. The Markov chain is fully described by an initial distribution \(\boldsymbol{\delta}^{(1)} = \bigl( \Pr(g_1=1), \ldots, \Pr(g_1=N) \bigr)\) and a transition probability matrix (t.p.m.) \(\boldsymbol{\Gamma} = (\gamma_{ij})\), with \(\gamma_{ij} = \Pr(g_t = j \mid g_{t-1} = i), \ i,j = 1, \ldots, N\). The connection between both stochastic processes arises from the assumption that the distribution of the observation \(y_t\) is fully determined by the state that is currently active. More formally, \[\begin{equation*} f(y_t \mid g_1, \ldots, g_T, y_1, \ldots, y_{t-1},y_{t+1},\ldots,y_T) = f(y_t \mid g_t = j), \qquad j \in \{1, \ldots, N\}, \end{equation*}\] which we denote by \(f_j(y_t)\) in short. In general, \(f_j\) can be any density or probability mass function depending on the type of data, and a typical choice is a parametric distribution with separate parameters for each latent state. Following the approach of Franks et al. (2015), we opt for a Gaussian distribution with a mean that is fully determined by the current \(y\)-coordinate of offensive player \(j \in \{1, \dots, N\}\) and a standard deviation that is shared across all states and estimated from the data.

To fit the model, we use direct numerical likelihood maximization. The HMM likelihood for the motion of a specific defender in a specific play can be calculated based on the so-called forward algorithm. It effectively performs a summation over all possible latent state sequences in an efficient manner, rendering the computational complexity linear in the number of observations. Time series of different defenders within the same play and of different plays are treated as independent, hence their log-likelihood contributions are summed to obtain the full log-likelihood of the training data. For practical implementation, we wrote a custom likelihood function in R, using the function forward() and other convenience functions from the R package LaMa (Koslik, 2024) to speed up computations. Furthermore, we used the R package RTMB to make this likelihood function compatible with automatic differentiation, making the numerical optimization process more efficient and robust. The parameters to be estimated are only the transition probability matrix \(\boldsymbol{\Gamma}\) and the standard deviation of the Gaussian distribution. The initial distribution of the guarding assignment for each defender in each play arises from a deterministic assignment approach, based on spatial proximity at the beginning of the time series, i.e., the moment the line is set. Specifically, for each defender in the predefined set of defenders considered, we compute a weighted Euclidean distance (giving more weight to the \(y\)-coordinate) to the five offensive players that could potentially be guarded. The initial distribution is then set to 1 for the closest offensive player under this metric and 0 otherwise. While this approach proved to be sufficiently reliable in this application, it may be fine-tuned in future iterations.
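To make the likelihood evaluation concrete, the following Python sketch implements a scaled forward algorithm for a single defender in a single play. It is an illustration under stated assumptions and not our R/LaMa implementation: the function names, the argument layout, and in particular the weight `w_y = 5.0` in the distance are hypothetical placeholders.

```python
import math


def normal_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def initial_distribution(def_xy, off_xy, w_y=5.0):
    """One-hot initial distribution on the closest offensive player.

    Uses a weighted Euclidean distance; w_y > 1 up-weights the y-axis.
    The value 5.0 is an arbitrary placeholder, not an estimated quantity.
    """
    dists = [
        math.sqrt((def_xy[0] - ox) ** 2 + w_y * (def_xy[1] - oy) ** 2)
        for ox, oy in off_xy
    ]
    closest = dists.index(min(dists))
    return [1.0 if k == closest else 0.0 for k in range(len(off_xy))]


def forward_loglik(y_def, y_off, delta, Gamma, sigma):
    """Log-likelihood of one defender's y-series via the scaled forward algorithm.

    y_def[t]    : y-coordinate of the defender at time t
    y_off[t][k] : y-coordinate of offensive player k at time t (state-dependent mean)
    delta       : initial state distribution
    Gamma       : transition probability matrix, Gamma[i][j] = Pr(j | i)
    sigma       : common standard deviation of the Gaussian emissions
    """
    n_states = len(delta)
    loglik = 0.0
    phi = list(delta)
    for t in range(len(y_def)):
        if t > 0:
            # Propagate the state distribution one step through the Markov chain
            phi = [sum(phi[i] * Gamma[i][j] for i in range(n_states))
                   for j in range(n_states)]
        # Weight each state by the density of the current observation
        phi = [phi[j] * normal_pdf(y_def[t], y_off[t][j], sigma)
               for j in range(n_states)]
        # Rescale to avoid numerical underflow and accumulate the log-scale factor
        scale = sum(phi)
        loglik += math.log(scale)
        phi = [v / scale for v in phi]
    return loglik
```

The full log-likelihood is then obtained by summing `forward_loglik` over all defenders and plays, and this sum would be maximized numerically with respect to `Gamma` and `sigma`.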

Having fitted an HMM to the data, we can use the model to predict the underlying state sequence based on the observations. This process is called state decoding and two main approaches exist. So-called local decoding constructs the conditional distributions \[ \Pr(g_t = j \mid y_1, \dots, y_T) \] while global decoding using the Viterbi algorithm finds the single state sequence that is most probable given the observations. Local decoding retains more probabilistic information as it provides a categorical state distribution for each time point, while global decoding is more suitable for visualization purposes as it provides a single state sequence that is most likely to have generated the observations. To obtain both the local state probabilities and the global state sequence, we used the functions stateprobs() and viterbi() that are also contained in the R package LaMa.
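Both decoding approaches can be sketched in plain Python. The code below is a generic textbook version for illustration and does not reproduce the LaMa functions; the names `local_decode` and `viterbi_path` and the input `dens[t][j]` (the state-dependent density \(f_j(y_t)\) evaluated at each time point) are assumptions of this sketch, and the numbers in the toy example are made up.

```python
import math


def local_decode(delta, Gamma, dens):
    """Local decoding: probs[t][j] = Pr(g_t = j | y_1, ..., y_T)."""
    T, N = len(dens), len(delta)
    # Scaled forward pass
    alpha, phi = [], list(delta)
    for t in range(T):
        if t > 0:
            phi = [sum(phi[i] * Gamma[i][j] for i in range(N)) for j in range(N)]
        phi = [phi[j] * dens[t][j] for j in range(N)]
        total = sum(phi)
        phi = [v / total for v in phi]
        alpha.append(phi)
    # Scaled backward pass
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        b = [sum(Gamma[i][j] * dens[t + 1][j] * beta[t + 1][j] for j in range(N))
             for i in range(N)]
        total = sum(b)
        beta[t] = [v / total for v in b]
    # Combine and normalize; the per-time scaling constants cancel out
    probs = []
    for t in range(T):
        w = [alpha[t][j] * beta[t][j] for j in range(N)]
        total = sum(w)
        probs.append([v / total for v in w])
    return probs


def viterbi_path(delta, Gamma, dens):
    """Global decoding: most probable state sequence (computed on the log scale)."""
    def log(v):
        return math.log(v) if v > 0 else -math.inf

    T, N = len(dens), len(delta)
    xi = [log(delta[j]) + log(dens[0][j]) for j in range(N)]
    backpointers = []
    for t in range(1, T):
        pointers, new_xi = [], []
        for j in range(N):
            best = max(range(N), key=lambda i: xi[i] + log(Gamma[i][j]))
            pointers.append(best)
            new_xi.append(xi[best] + log(Gamma[best][j]) + log(dens[t][j]))
        xi = new_xi
        backpointers.append(pointers)
    path = [max(range(N), key=xi.__getitem__)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy example: at t = 2 the defender is momentarily closer to player 1
delta = [1.0, 0.0]
Gamma = [[0.9, 0.1], [0.1, 0.9]]
dens = [[1.0, 0.1], [1.0, 0.1], [0.4, 0.5], [1.0, 0.1]]

pointwise = [max(range(2), key=d.__getitem__) for d in dens]   # [0, 0, 1, 0]
path = viterbi_path(delta, Gamma, dens)                        # [0, 0, 0, 0]
probs = local_decode(delta, Gamma, dens)
```

The toy example mirrors the situation described for the Kansas City play in the main text: a pointwise assignment that ignores time would report a brief switch at the third time point, whereas both decoding approaches keep the defender on offensive player 0 because the transition probability matrix favors persistence.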